feat: adds failure domain api for AWS and EKS #1347
Conversation
- name: workerConfig
  value:
    aws:
      failureDomain: us-west-2a
Were you able to verify this manually for EKS? From the docs it uses failureDomain: "1":
https://cluster-api-aws.sigs.k8s.io/topics/failure-domains/worker-nodes#failure-domains-in-worker-nodes
But in the CAPA code it does look like it should be the AZ, as you have it documented.
Co-authored-by: Dimitri Koshkin <[email protected]>
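To make the open question concrete, here is a hedged sketch of the two candidate forms for the EKS example (values are placeholders; which form CAPA actually accepts for managed clusters is exactly what the reviewer is asking to have verified):

```yaml
- name: workerConfig
  value:
    eks:
      # availability-zone form, as documented in this PR
      failureDomain: us-west-2a
      # the numeric form shown in the linked CAPA worker-node docs would instead be:
      # failureDomain: "1"
```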
@@ -0,0 +1,73 @@
+++
title = "AWS Failure Domain"
| title = "AWS Failure Domain" | |
| title = "EKS Failure Domain" |
variables:
  - name: workerConfig
    value:
      aws:
Suggested change:
- aws:
+ eks:
overrides:
  - name: workerConfig
    value:
      aws:
Suggested change:
- aws:
+ eks:
overrides:
  - name: workerConfig
    value:
      aws:
Suggested change:
- aws:
+ eks:
overrides:
  - name: workerConfig
    value:
      aws:
Suggested change:
- aws:
+ eks:
if obj.GetKind() != "MachineDeployment" || obj.GetAPIVersion() != clusterv1.GroupVersion.String() {
	log.V(5).Info("not a MachineDeployment, skipping")
	return nil
}

log.WithValues(
	"patchedObjectKind", obj.GetKind(),
	"patchedObjectName", client.ObjectKeyFromObject(obj),
).Info("setting failure domain in worker MachineDeployment spec")

if err := unstructured.SetNestedField(
	obj.Object,
	failureDomainVar,
	"spec", "template", "spec", "failureDomain",
); err != nil {
	return err
We should be using patches.MutateIfApplicable instead, for consistency with all the other handlers, and use concrete types instead of unstructured.
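For illustration, a minimal sketch of how this block could be reshaped around patches.MutateIfApplicable with the concrete CAPI MachineDeployment type. The helper's exact signature, the selector constructor, and the surrounding handler variables (obj, vars, holderRef, log, failureDomainVar) are assumptions inferred from the review comment and the visible code, not verified against this repository:

```go
// Sketch only: assumes patches.MutateIfApplicable decodes obj into the concrete
// type, applies the mutation func when the selector matches, and re-encodes the
// result, making the explicit Kind/APIVersion guard and the
// unstructured.SetNestedField call above unnecessary.
// Assumed imports: clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1",
// "k8s.io/utils/ptr", "sigs.k8s.io/controller-runtime/pkg/client".
return patches.MutateIfApplicable(
	obj, vars, &holderRef, selectors.WorkersSelector(), log, // selector helper name is a placeholder
	func(md *clusterv1.MachineDeployment) error {
		log.WithValues(
			"patchedObjectKind", md.Kind,
			"patchedObjectName", client.ObjectKeyFromObject(md),
		).Info("setting failure domain in worker MachineDeployment spec")

		// MachineSpec.FailureDomain is a *string in CAPI v1beta1.
		md.Spec.Template.Spec.FailureDomain = ptr.To(failureDomainVar)
		return nil
	},
)
```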
	variableFieldPath []string
}

func NewWorkerPatch() *awsFailureDomainWorkerPatchHandler
We should add a failure domain patch for the AWS control plane too.
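Purely as a hypothetical illustration of what that follow-up could expose (no such field exists in this PR, and a KubeadmControlPlane normally spreads machines across the cluster's failure domains on its own), a control-plane counterpart might live under the clusterConfig variable:

```yaml
variables:
  - name: clusterConfig
    value:
      controlPlane:
        aws:
          # hypothetical field, shown only to illustrate the shape such an API might take
          failureDomains:
            - us-west-2a
            - us-west-2b
```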
supershal left a comment:
Suggested a few changes.
yanhua121 left a comment:
I think this is similar to the topic we discussed about where to put the worker nodePool failureDomain reference. For the worker nodePool MachineDeployment, we can take advantage of the MD's existing spec.failureDomain field (which applies to all providers, since MachineDeployment comes from CAPI) instead of configuring it via each individual provider's cluster.spec.topology workerConfig variable and using an injector to set the MD's spec.failureDomain. We can discuss this tomorrow if you disagree.
Refer to the thread discussion on where to put the failureDomain reference for the worker nodePool in the Nutanix CAREN configuration.
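For context, the alternative described above relies on the failure domain field that CAPI already exposes on each MachineDeployment topology entry, with no provider-specific variable involved. A minimal sketch, with placeholder cluster, class, and version values:

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: my-cluster           # placeholder
spec:
  topology:
    class: my-clusterclass   # placeholder
    version: v1.29.0         # placeholder
    workers:
      machineDeployments:
        - class: default-worker
          name: md-0
          # CAPI-native field; propagated by the topology controller into the
          # generated MachineDeployment's spec.template.spec.failureDomain
          failureDomain: us-west-2a
```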
**What problem does this PR solve?**:
Adds a failure domain API to AWS/EKS MachineDeployments.
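As a usage sketch (layout inferred from the docs added in this PR; cluster class, MachineDeployment names, and AZ values are placeholders), the new variable can be set cluster-wide and overridden per MachineDeployment:

```yaml
spec:
  topology:
    variables:
      - name: workerConfig
        value:
          aws:
            failureDomain: us-west-2a
    workers:
      machineDeployments:
        - class: default-worker
          name: md-0
          variables:
            overrides:
              - name: workerConfig
                value:
                  aws:
                    failureDomain: us-west-2b
```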
Which issue(s) this PR fixes:
https://jira.nutanix.com/browse/NCN-110350
How Has This Been Tested?:
Special notes for your reviewer: